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■ Abstract 

a" 

We study simple mathematical models of gene expression to explore the possible 
origins of haploinsufficiency (HI). In a diploid organism, each gene exists in two 
copies and when one of these is mutated, the amount of proteins synthesized is 
reduced and may fall below a threshold level for the onset of some desired activity. 
This can give rise to HI, a manifestation of which is in the form of a disease. We 
consider both deterministic and stochastic models of gene expression and suggest 
possible scenarios for the occurrence of HI in the two cases. In the stochastic case, 
random fluctuations around the mean protein level give rise to a finite probability 
that the protein level falls below a threshold. Increased gene copy number and 
faster gene expression kinetics reduce the variance around the mean protein level. 
The difference between slow and fast gene expression kinetics, as regards response 
to a signaling gradient, is further pointed out. The majority of results reported in 
the paper are derived analytically. 

^ ■ PACS: 05.10.Gg, 82.30.-k, 87.10.+e, 87.15.Aa 
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I. Introduction 

Complex multicellular organisms are in general diploids, i.e., each cell in an organism 
contains two copies of the full set of genes in contrast to haploids in which each cell 
contains a single copy of the genome. Genes provide the blueprint for the synthesis of 
proteins which perform essential functions in cells. If one copy of a gene is mutated, 
there is approximately a 50% reduction in the level of proteins synthesized. In many 
cases this does not lead to observable changes and normalcy is retained. A common 
interpretation of haploinsufficiency (HI) is that it occurs when half normal levels of 
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proteins are insufficient for completing particular tasks, leading to specific types of dis- 
eases. More generally, HI may occur when the level of proteins synthesized falls below 
a critical level for the onset of some desired activity. 

There is presently an extensive literature on the genetic and biomedical aspects of 
HI IHJEH3J but mathematical models exploring the origin of HI and the issues related to 
it are practically non-existent. It is by now well-accepted that stochastic processes have 
considerable effect on patterns of gene expression in cells L4| |6j |Zl - Cook et al |3[ have 
studied the role of stochastic gene expression in HI by constructing a minimal model 
of gene expression and using numerical techniques to simulate the model. Their major 
finding is that when one of the two genes in a diploid organism is inactivated due to 
mutations, there is an increased susceptibility to stochastic initiations and interruptions 
of gene expression. As a result, the number of proteins produced may transiently fall 
below the desired level giving rise to HI. Both increased gene copy number and faster 
gene expression kinetics reduce expression noise, thus enhancing the possibility of a 
stable outcome. 

A large number of diseases are caused by mutations in genes encoding proteins 
called transcription factors (TF). More than 30 different human maladies have been at- 
tributed to TF HI TFs regulate gene expression by binding at the promoter region 
of the gene to be expressed. Cooperative interactions among the TFs favour the forma- 
tion of bound TF complexes (oligomers). The TFs interact at only one site or at multiple 
sites of the promoter. A simple mathematical model has been proposed to explore HI 
in systems involving cooperative assembly of TFs Q- Such multimeric complexes are 
essential for initiation of gene expression in many eukaryotic systems. The model ex- 
plores the relationship of fractional oligomerization Y with the free ({S}) as well as total 
concentrations of TFs ([So]). The TFs oligomerise to form a bound complex. The curves 
Y versus [S] and [S ] have sigmoidal shapes. Due to the characteristic S shape of a 
sigmoid, a small change in the TF concentration around the inflection point (the point 
at which the tangent to the curve has the maximum slope) gives rise to a significant 
change in the magnitude of Y. Thus, if there are two TF-encoding genes and one of 
these becomes silent, the level of TFs produced may fall below the inflection point of 
the sigmoid and consequently the magnitude of Y, the fractional oligomerization, is 
considerably decreased. This results in reduced expression from the target gene, giving 
rise to TF HI if the amount of proteins synthesized falls below a threshold level. 

In Section 2 of this paper, we extend the minimal model of Cook et al |3| to inves- 
tigate the influence of bound complexes of TFs on the initiation of gene expression. In 
Section 3, we study the stochastic version of the minimal model and its extensions to 
elucidate the role of stochasticity in HI. We derive analytical expressions for the quanti- 
ties determined numerically by Cook et al 151. 
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2. Deterministic Model 



The model is an extension of the minimal model of gene expression studied by Cook 
et al Q. A brief description of the model is as follows. A gene can be in two possible 
states: inactive (G) and active (G*). Random transtions occur between the states G and 
G* according to the first order reaction kinetics 

k a jp kp 

G # G* ► p > $ (1) 

k d 

where k a and kd are the activation and deactivation rate constants. The corresponding 
half-times are T a = ^ and T d = respectively. In the active state G*, the gene 
synthesizes a protein (p) with the rate constant j p . The protein product degrades with a 
rate constant k p and the associated half-time is T p . The protein degradation product is 
represented as $. We now assume that activation to the state G* is brought about by an 
inducing stimulus S, e.g., TFs. The reaction scheme in the presence of the stimulus is 
given by 

k\ k a jp kp 

G + S # GS # G* — > p — y $ (2) 
h k d 

where GS represents the bound complex of G and S from which transition to the active 
state G* occurs. If no is the total concentration of genes then 

n G = [G] + [GS] + [G*] (3) 

where [G], [GS] and [G*] denote the concentrations of genes in the states G, GS and G* 
respectively. In the steady state, we have 



<I( ' A - k 2 [GS] - h[G][S] =0 



dt 
so that 

HP = [GS] (4) 
K s 

where K s — |^ is the equilibrium dissociation constant. From and (HJ), we get 

|GS| = TT]si7i?:- |G| rns]77f: (5) 

Also, in the steady state, 

^l = k a [GS}-k d [G*} = (6) 
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From ^ and ©, the expression for [G*] in the steady state is given by 



n Gka 1+ js]/K 

S 

■■1+[S]/K S 



The reaction scheme in ||TJ| leads to the expression 

in the steady state. Expression and © are equivalent on defining effective 
activation and deactivation rate constants: 

k'-k [S]/Ks 



[S]/K s > 

k' d = k d (9) 

We now assume the inducing stimulus to be TFs. In the simplest approximation, n 
individual TFs oligomerise to produce an active complex S n according to the reaction 
scheme 

K 

nS # S n (10) 

The n TFs interact all at once to give rise to the bound complex [S n ], i.e., we ignore the 
formation of dimers, tetramers, etc. Let [So] be the initial concentration of TFs. Then 

[S ] = [S]+n [S n ] (11) 

where [S] and [S n ] are the concentrations of free TFs and the bound TF-complex 
respectively. The global equilibrium constant K is given by 

ry- __ [Sn] _ [Sn] 2 \ 

[5]" ([S ]-n[S n ]r K ] 

The fractional oligomerization is defined as |2J 

Y _ n [Sn] n[S n ] 

[S ] " [S]+n[S n ] K ) 

Using (|TTJ) and ((12)) , Y can further be written as 



jft + [S] n 

and 



Y = isr^ d4) 
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o 

Figure 1: Fractional oligomerization Y versus [So] (Eq. (IT5t ) for n = 6, K = 2 and 

[S„] = 0.2 

Fig.l shows the curve, fractional oligomerization Y versus [So] ( Eq. dl5t ) for 
n = 6, K = 2 and [S n ] = 0.2. The curve has the well-known sigmoidal shape. 

We now replace [S] by [S n ] in the reaction scheme described by Eq. (|2jl, i.e., we 
assume that the TF-oligomer S n binds to a gene in the inactive state G to give rise to the 
bound complex GS n . Transition to the active state G* occurs from the intermediate state 
GS n . The concentration [G*] in the steady state is obtained from (0) by replacing [S] by 
[S n ] where [S n ] = K [S] n ( Eq. CE3 )• One finally obtains 



n G k a 



K [S} n /K s 



[G J ~ /, Fgjy^ ,~7~ ^ 



The concentration of proteins in the steady state is given by 



From l(T4)> , [5] n can be written as 



l-Y Kn 

From (fT6l and ((T7)l and for fc a = fc^, we get 



[p] = 3 f [G*] (17) 

ftp 



[S] n = t^7 #: (18) 



n G T r—TTT? 77 ( 19 ) 



k p 1 + 2aY - y 



where a 



Fig. 2 shows the protein concentration [p] versus fractional oligomerization Y for 

k a = k d (Eq.dl9t), n G = 2, j p = 0.5 k p , K s = 1.2 and n = 6. From Eqs. (jTTJ), dT6t . and $\7\ 
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Figure 2: Protein concentration [p] versus fractional oligomerization Y for k a = kd (Eq. 

CS)/ ™g = 2, jp = 0.5 fc p/ K s = 1.2 and n = 6. 




Figure 3: Protein concentration [p] versus [S ] (Eq. d20l)for n = 6, K 

[S n ] = 0.2 



2, K s = 1.2 and 



and for k a = kd, the concentration of protein \p], as a function of the total concentration 
[S'o] of TFs, can be written as 



[p] = n G 



j p ([S ]-n[S n ]r 
k p 2([S ]-n[S n }) n + K s /K 



(20) 



Fig. 3 shows the plot [p] versus [S ] (Eq. f or n = 6, K = 2, K s = 1.2 and [5 n ] = 0.2. 
Figs. 1, 2 and 3 provide a possible explanation for the origin of HI. Suppose two 
TF-encoding genes produce TFs of total concentration [S'o] = 4. If one of the genes is 
inactivated due to mutations, the total concentration [S ] falls to the value 2. For the 
parameter values corresponding to Fig. 1, the fractional oligomerization Y has the 
value 0.797. The TFs form a bound complex S n (n = 6) which then activates the gene 
synthesizing the protein P. The concentration of proteins [p] corresponding to 
Y = 0.797 and for [S] = 0.8 is given by [p] = 0.233 (Fig. 2). The same value [p] is 
obtained from Fig. 3 with [S ] = 2. If both the encoding genes are active, the values of 
[S ], Y and P are [S ] = 4, Y — 1.0 and [p] = 0.5 respectively. Thus, for one gene, the 
protein level is reduced by more than half. If the level falls below a threshold, the 
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amount of proteins synthesized is not sufficient for the execution of a particular task. 
This gives rise to HI, the manifestation of which is in the form of a disease. 



3. Stochastic Approach 

Let us first consider the model described in Section 2 in the absence of an inducing 
stimulus. Let n tot be the total no of genes and n , n\ (n tot = n Q +ni), the number of genes 
in the inactive (G) and active (G*) states respectively. In the stochastic model, a gene 
makes random transitions between the inactive and active states with k a and kd being 
the activation and deactivation rate constants. In the active state, protein production 
and degradation occur with the rate constants j p and k p respectively. Let p(rii,n 2 ,t) 
be the probability that at time t, n\ genes are in the active state G* and the number of 
protein molecules is n 2 . The rate of change of the probability with respect to time is 
given by the Master Equation 

dp(ni,n 2 ,t) = fc a [( ntot _ ni _|_ l)p(m - 1, n 2 , t) - (n tot - rii)p(ni, n 2 , t)] 

+k d [(n l + l)p(ni + 1, n 2 , t) - n 1 p(n 1 ,n2, t)} ^i) 

+j P [n>ip{ni, n 2 - 1, t) - n x p(n 1: n 2: t)] 
+k p [(n 2 + l)p(ni, n 2 + 1, t) - n 2 p(nx, n 2 , t)) 

For each rate constant, there is a gain term which adds to the probability and a loss 
term which subtracts from the probability. 

We now use the standard approach in the theory of stochastic processes 1 8 [ to deter- 
mine the average number of activated genes < n\ > and proteins < n 2 > in the steady 
state and the variances thereof. Define the generating function 

F{z u z 2 , t) = £ z?v{n u Tia, t) (22) 

n\,n2 

In terms of the generating function, the Master equation <|2T|> becomes 

dF dF dF dF dF 
-rrr = kantat^zi-VjF-ka^Zx-VjZx- k d (zi-\)- — hj p (22-l)^i^ k p (z 2 -\)— (23) 

ut OZ\ OZ\ UZ\ OZ 2 

In the steady state ^ = 0. The following properties of the generating function are used 
in subsequent calculations: 

F L =M2= i= 1 (24) 

< n l >= 1 21=1,23=1) < n 2>= T— |21=1,2 2 =1 (25) 

UZ\ <JZ 2 

where < n\ > is the mean number of active genes, i.e., genes in the state G* and < n 2 > 
is the same for proteins. Furthermore, 



g£ | Z1=M2=1 =< n\ > - < n x > 

| zl=V2= i=< n 2 2 > - < n 2 > 1 V 
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Hence the variances around the mean levels are given by 



Var ni =< n\ > - < n x > 2 = \ Zl =i, Z2 =i + < n x > - < n x > 2 
Var n2 =< n\ > - < n 2 > 2 = \ Zl =i,z 2 =i + < n 2 > - < n 2 > 2 



Successive differentiation of Eq. d23b = 0.) with respect to z\ and z 2 gives rise to 
linear equations for successively higher moments. The equations may be solved to 
obtain, in particular, the mean and the variance. For example, differentiating Eq. d23t 
with respect to z\ and z 2 and then putting z\, z 2 — 1, one obtains expressions for the 
mean. 

The mean and variance are given by 

TT'tot k a /r»n\ 

K ni >= VTIZ (28) 

Var ni =< nx > (29) 
< n 2 >=< p >=< n 1 > jr = jr , , (30) 

kp [k a + k(i){ka + kd + kp) 
As in Ref. 3, temporal quantities are scaled relative to the product half-life T p = 

Let T a = l -^=- and Td = be the times for half-maximal gene activation and 
deactivation respectively. The times T a and T d are scaled relative to T p . Some of the 
results obtained in Ref. 3, using numerical simulation techniques, can readily be 
derived from the analytical expressions in d28l- (l3Tl) . Stochasticity introduces random 
fluctuations around the mean protein level and variance gives a measure of the spread. 
Let T a = T d = T p /A, i.e., k a = k d = ak p with a > 0. As a increases, one has faster 
expression kinetics and from ((31]) it is easy to verify that variance is reduced, i.e., the 
expression noise is less. The mean product level (Eq. ((30)) ) is, however, independent of 
a. With increase in j p , i.e., the protein synthesis rate, the variance increases. Let us now 
consider the case when the net expression rate of n tot genes is distributed to one single 
gene so that the mean protein level remains the same. From d30t 

<U2> =Y ! ¥ = H (32) 

where j = j p n tot is the expression rate when only one gene is considered. Since 
jp > j p , the gene copy number is reduced from n tot to 1 . Similarly, when the net 
expression rate of n tot genes is distributed to a larger number genes, say, from two to 
four, the variance is reduced.When one of two genes is inactivated due to mutations, 
the average protein level in the steady state is reduced by 50%. This may still be higher 
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than the threshold level required for protein activity. Due to the variance around the 
mean level, the number of proteins may transiently fall below the threshold giving rise 
to HI. The occurrence of HI further becomes more probable for slower expression 
kinetics as then the variance is increased in magnitude. For stochastic gene expression 
in the presence of an inducing stimulus, say, TF's, we use the effective model with the 
activation/ deactivation rate constants given in Eq. ©. The expressions for the mean 
and the variance are the same as in Eqs. d28b-d3Tl but with k a/ k d replaced by k' a and k' d 
respectively. 

We now derive expressions for the probability distributions of protein levels in the 
steady state. To do this, we consider a simpler stochastic model in which the only 
stochasticity arises from the random transitions of a gene between the inactive and ac- 
tive states. In each state of the gene, the concentration of proteins evolves deterministi- 
cally according to the equation 

z — k p x = f(x,z) (33) 



dt X, 



max 



where z — 1 (0) when the gene is in the active (inactive) state and x = v x , X and 
X m ax being the protein concentration at time t and the maximum protein concentration 
respectively. We note that X max = jf-. Let Pj(x, t) (J = 0, 1) be the probability density 
function when z = j. The total probability density function is 

p(x,t) = p (x,t) +p 1 (x,t) (34) 

The rate of change of probability density is given by 

^&t^ = -■feVMte&W + T,\WkjPk{x,t) - W jkPj (x,t)} (35) 

where W k j is the transition rate from the state k to the state j and Wjk is the same for 
the reverse transition. The first term in Eq. d35b is the so called "transport" term 
representing the net flow of the probability density. The second term represents the 
gain/loss in the probability density due to random transitions between the state j and 
other accessible states. In the present case, Eq. d35t gives rise to the following two 
equations: 

dpo (x. t) 

— dt~ = ~fa(~ k P xp °( x,t ^ + kdPi(x>t) - k a p (x,t) (36) 
dVl< f+ ~ = ~7r{(^ k p x)p 1 (x 1 t)} + k a p (x,t) - k d px(x,t) (37) 

Ot OX sv max 

The Master equation (Eq. ((2T]) ) provides a full stochastic description of all the processes 
associated with gene expression, namely, gene activation and deactivation, protein 
synthesis and degradation. The only stochastic events considered by Cook et al are 
those related to gene activation and deactivation. In their model, protein synthesis 
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Figure 4: p(x) versus x for slow (dotted curve) and fast (solid) gene expression kinetics 

from the active gene and protein degradation occur in a deterministic manner. Eqs. 
d36b - d37t describe the scenario studied by Cook et al. The steady state distribution in 
this case is given by 

p(i) = C^ 1) (l-i) ( ^ 1) (38) 
where C the normalization constant is given by the inverse of a beta function. 
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Since the probability density function is known, the mean protein level and its variance 
can be calculated in a straightforward manner. The mean protein level is identical to 
that obtained from the Master equation (Eq. ((2T|> ) whereas the variance is 
underestimated as stochasticity is taken into account only at the levels of gene 
activation and deactivation. Fig. 4 shows the plot of p(x) versus x for slow 
(T a = T d = T p /A) and fast (T a = T d = T p /A0) gene expression kinetics. In the latter case, 
the distribution is significantly narrower, i.e., faster kinetics lead to a reduction in the 
variance. The same conclusion is reached from the Master equation approach. From 
the full width at half maximum of the broader distribution, one finds that x ranges 
from 0.25 to 0.75, 0.5 being the mean value. Thus, it is probable that the protein level 
falls below the threshold for desired activity giving rise to HI. Let x t hr (< 1) be the 
threshold value of x. The probability that x is greater than x t hr is 

> **) = 1 " /o ; 1 ' I „ <*> 

JoV^ ^(l-x)^ l) dx 

kg 

kpx% 2F 1 [l-^,^,l+^,X thr ] 

_ y thr kp'kp' k p ' tflr ' (41) 

kaB(tp^ P ) 

where 2 F± (a, b, c; z) is the hypergeometric function. 
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Figure 5: p(x > x t hr) versus S/K s in a semi-logarithm plot for n 
curve corresponds to fast (slow) kinetics. 
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Figure 6: p(x > x thr ) versus S/K' s [(K' s ) n = K s /K] in a semi-logarithm plot for n 
The solid (dotted) curve corresponds to fast (slow) kinetics. 



Let Xthr be 0.25. The probability p(x > x thr ) is computed using Mathematica for both 
slow and fast gene expression kinetics. The values of p(x > x t hr) in the slow and fast 
cases are 0.9294 and 0.9999 respectively. Since in the latter case, the probability that the 
protein level exceeds the threshold is higher, the chance of HI occurrence is 
correspondingly lower. 

In the presence of an inducing stimulus, say, TFs, the probability of activation above 
a threshold is again given by (|40)l with k a and k d replaced by k' a and k' d (Eq. ||9jl) . Fig. 
5 shows p(x > Xthr), Xthr = 0.25, versus S/K s in a semi-logarithm plot for both slow 
and fast kinetics. In the fast case, a substantially steeper curve is obtained leading to en- 
hanced signal discrimination, i.e., a more predictable response in a gradient of inducing 
signal. As shown by Cook et al |3[, the signal discrimination ability increases with gene 
copy number. One can thus speculate that diploid organisms utilise stochastic expres- 
sion kinetics, preferably fast, for signal discrimination and are susceptible to degraded 
signal discrimination due to a reduction of gene copy number in the haploid state. Mu- 
tations in the subset of genes which generate a response to signaling gradients in diploid 
organisms may be the cause of some HI syndromes associated with these systems. Fig. 
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6 shows the same plot as in Fig. 5 but now the TF's form bound complexes with n = 6. 
In Eq. ®, [S] is replaced by [3 n ] = K[S] n ( Eq. 02)). One now finds that the slopes of 
the curves for slow and fast kinetics are similar. Thus as n, the number of TF's forming 
the bound complex, increases, the distinction between slow and fast gene expression 
kinetics, as regards their signal discrimination ability, becomes less pronounced. 

4. Summary 

In this paper, we have studied simple mathematical models to explore the possible ori- 
gins of HI. In Section 2, we have considered a deterministic model in which a complex 
of n TFs binds at the appropriate region of DNA to initiate gene expression in eukary- 
otes. The concentration of proteins synthesized versus the total concentration of TFs 
(Fig. 3) is a sigmoid. Due to the S-shape of the curve, the protein level may fall below a 
threshold when one of the two genes synthesizing the TFs is mutated, resulting in a 50% 
reduction in the total concentration of TFs. The absence of required protein activity can 
give rise to HI. In Section 3, we have studied simple stochastic models of gene expres- 
sion and shown that due to random fluctuations around the mean protein concentration 
in the steady state, the protein level may fall below the threshold even though it does 
not do so in the deterministic case. The variance, a measure of the spread around the 
mean protein level, is reduced with increasing gene copy number and faster expression 
kinetics. The variance increases if the rate constant j p associated with protein synthesis 
is increased. In the case of one gene, we have further calculated the probability that the 
concentration of proteins exceeds a threshold in the absence as well as the presence of 
an inducing stimulus. In the latter case, faster gene expression kinetics give rise to a 
sharper response to changing stimulus concentrations. As shown by Cook et al [3J, this 
is also true when the gene copy number is increased. Thus the signal discrimination 
ability of diploid organisms may be impaired in the haploid state. When the inducing 
stimulus is a bound complex of n TFs, the distinction between slow and fast gene ex- 
pression kinetics becomes less with increasing n. To sum up, we have considered both 
deterministic as well as stochastic models of gene expression and indicated possible 
scenarios for the occurrence of HI. 
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